Abstract

Background: Sjogren's syndrome is a tissue-specific autoimmune disease that affects exocrine tissues, especially 
salivary glands and lacrimal glands. Despite a large body of evidence gathered over the past 60 years, significant 
gaps still exist in our understanding of Sjogren's syndrome. The goal of this study was to develop a database that 
collects and organizes gene and protein expression data from the existing literature for comparative analysis with 
future gene expression and proteomic studies of Sjogren's syndrome. 

Description: To catalog the existing knowledge in the field, we generated the Sjogren's Syndrome Knowledge Base
(SSKB) of published gene/protein data by text mining over 7,700 PubMed abstracts, which yielded approximately 500
potential genes/proteins. The raw data were manually evaluated to remove duplicates and false positives and to
assign gene names. The database was manually curated to 477 entries, including 377 potential functional genes,
which were used for enrichment and pathway analysis with Gene Ontology and KEGG.

Conclusions: The Sjogren's syndrome knowledge base (http://sskb.umn.edu) can form the foundation for an
informed search of existing knowledge in the field as new potential therapeutic targets are identified by
conventional or high-throughput experimental techniques.



Background 

Sjogren's syndrome is a tissue-specific autoimmune disease that affects exocrine tissues, especially salivary
glands and lacrimal glands. It is one of the most common autoimmune disorders in the U.S., with an estimated
prevalence of 2-4 million people. The autoimmune-mediated damage of the salivary and lacrimal glands in
Sjogren's syndrome leads to a decrease in the production of saliva and tears and to the development of dry
mouth and dry eyes. Without the lubricating and protective functions of saliva and tears, the oral and ocular
surfaces are subject to infections and discomfort, leading to a significantly reduced quality of life [1,2].

Development of Sjogren's syndrome requires a complex interplay between a number of genetic, hormonal
and environmental factors, most of which have not been defined. Genetic linkages, especially involving major
histocompatibility complex (MHC) genes, have been reported for Sjogren's syndrome, but it is not clear if, or
how, the associated genes are involved in the development of the disease [3]. Additional non-MHC genes
have also been linked with the development of Sjogren's syndrome.

In addition to genetic predisposition, some studies suggest that infection of a genetically susceptible
individual by a virus or other pathogen might trigger the development of an autoimmune disease [4]. The proposed
mechanisms include activation of the innate immune system, release of self-antigens from damaged or apoptotic
tissues, and molecular mimicry that results in activation of T cells and/or B cells that react with tissue
antigens [4].

Both the innate and the adaptive immune systems are involved in the pathogenesis of Sjogren's syndrome. The
type I interferon (IFN) pathway, which plays an important role in the innate immune response to viruses, is
also thought to play an important role in the development of Sjogren's syndrome and other autoimmune
disorders, including systemic lupus erythematosus (SLE) [5,6]. Moreover, type I IFNs can activate the adaptive
immune system directly, by binding to IFN receptors on antigen presenting cells, T cells and B cells, or
indirectly, by inducing the production and release of cytokines and chemokines that bind to these cells.

Autoantibodies to intracellular antigens, notably the nuclear proteins SSA/Ro and SSB/La, are found in the
sera of many patients with Sjogren's syndrome. These autoantibodies are thought to develop when intracellular
antigens, some of which have undergone proteolytic cleavage that reveals new antigenic epitopes, become
"visible" to the immune system in membrane blebs on the surface of apoptotic cells [7]. Alternatively, antigenic
epitopes from bacteria and viruses, including Epstein-Barr virus (EBV) and coxsackie virus, may act as
molecular mimics that trigger the development of antibodies that cross-react with similar epitopes on target
tissue autoantigens [2,8,9]. Although autoantibodies to intracellular antigens are useful in the diagnosis of
Sjogren's syndrome, it is not clear if they play a direct role in the development of salivary gland and lacrimal
gland damage and hypofunction. In contrast, autoantibodies to the M3 muscarinic acetylcholine receptor
(M3R) have been directly implicated in salivary gland hypofunction in the nonobese diabetic (NOD) mouse
model of Sjogren's syndrome [10]. Importantly, function-inhibiting anti-M3R autoantibodies are found in the sera
of many patients with Sjogren's syndrome [11].

Current therapy for Sjogren's syndrome usually consists of palliative treatment that relieves the symptoms
of dry eye and dry mouth but fails to modify the underlying disease. Novel disease-modifying treatment
strategies, based on recent immunological insights in Sjogren's syndrome and other autoimmune diseases,
have met with mixed results [12]. For example, in recent clinical trials, treatment of Sjogren's syndrome patients
with a B cell-depleting anti-CD20 monoclonal antibody (rituximab) led to significant improvement of the
stimulated whole saliva flow rate and a reduction in parotid gland inflammation [13]. In contrast, TNFα inhibitors
have been ineffective in the treatment of Sjogren's syndrome. Detailed studies on the immune response in
Sjogren's syndrome patients treated with one of the inhibitors (etanercept) revealed an increase in the circulating
levels of TNFα [14]. These results suggest that TNFα may not play a pivotal role in the disease and that other
therapeutic targets must be identified.

Despite a large body of evidence gathered over the past 60 years, significant gaps still exist in our
understanding of Sjogren's syndrome. Recent gene expression and proteomic studies have identified many genes and
pathways that may play a role in the pathogenesis of Sjogren's syndrome [15-17]. However, validation of these
data will require significant additional effort. As an initial step in this validation, we have compiled the
published data on Sjogren's syndrome that are not derived from gene expression or proteomic studies. No such
unifying database currently exists. Through data curation, the existing data have been uniformly formatted
to allow systematic retrieval and comparisons to newly generated gene expression data. As an example of its
functionality, the Sjogren's Syndrome Knowledge Base (SSKB) was analyzed for biological functions and
pathways that are likely to play a role in the disease.

Construction and content 

Data mining 

To catalog the existing knowledge in the field, we used text mining to generate the Sjogren's Syndrome
Knowledge Base (SSKB) of published gene/protein data (http://sskb.umn.edu/) [18]. The focus of this database
is on individually identified genes and proteins; thus, microarray experiments were not included. The raw data
for the SSKB were extracted from PubMed [19] using the text mining program EBIMed
(http://www.ebi.ac.uk/Rebholz-srv/ebimed/) [20] with the search term "Sjogren's Syndrome" restricted to
"MeshHeadingsList". The foundational search identified over 7,700 abstracts and approximately 500 potential
genes/proteins. The SSKB is continually updated by regular automated searches of PubMed followed by manual
curation.
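
A MeSH-restricted PubMed search of the kind described above could be approximated as follows. This is a minimal sketch, not the procedure used for the SSKB: it assumes NCBI's E-utilities esearch endpoint in place of EBIMed, and the function name build_pubmed_query and the retmax default are illustrative choices. The code only builds the query URL and does not contact the server.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint. Assumption: this stands in for the
# EBIMed service named in the text, which is not what this sketch calls.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(mesh_term: str, retmax: int = 100) -> str:
    """Return an esearch URL that restricts the term to MeSH headings."""
    params = {
        "db": "pubmed",                        # search the PubMed database
        "term": f'"{mesh_term}"[MeSH Terms]',  # field tag limits the match to MeSH
        "retmax": retmax,                      # maximum number of PMIDs to return
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


url = build_pubmed_query("Sjogren's Syndrome")
print(url)
```

Fetching the URL returns a list of PubMed identifiers. Extracting gene and protein names from the corresponding abstracts, and the manual curation that follows, are separate steps that this sketch does not cover.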

Curation of raw data 

The identified abstracts were manually evaluated to remove duplicates and false positives. In older
publications, where gene names were not readily identifiable, names were assigned based on in-depth evaluation of
the protein name context and available gene data in public databases, including the National Center for
Biotechnology Information's Entrez search engine [21] and UniProt [22,23]. The SSKB includes data from human
studies and animal models. For the genes identified in animal models, the human homolog was identified by
automated ortholog search, using WebGestalt 2.0 [24,25]. These steps reduced the database to 477 current
entries. The online database contains the fully curated data, currently 413 entries, which can be
accessed at http://sskb.umn.edu. Updates and newly curated data are continually added.

The 477 entries were sorted to identify autoantigens and viral/bacterial antigens. Setting these aside left
377 potential functional genes, which were used for enrichment and pathway analysis.

Enrichment analysis 

The 377 human gene entries were used for subsequent enrichment analyses in WebGestalt [24,25]. Gene
enrichment in the SSKB gene set was compared to the human genome using the hypergeometric test with multiple test
adjustment [26] and a significance level of P < 0.01.
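
The test described above can be sketched in a few lines. This is an illustration under stated assumptions, not the WebGestalt implementation: the text specifies only "multiple test adjustment", so the Benjamini-Hochberg procedure used here is an assumed choice, and the genome size of 20,000 genes in the second example is a placeholder rather than a value from this study.

```python
from math import comb


def hypergeom_pvalue(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k): chance of seeing at least k category genes in a gene set.

    N = genes in the reference genome, K = reference genes in the category,
    n = genes in the set of interest, k = set genes that fall in the category.
    """
    upper = min(n, K)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy example: 20 genes in total, 5 in the category, 4 drawn, 2 of them in the category.
p_toy = hypergeom_pvalue(k=2, n=4, K=5, N=20)

# Calcineurin complex from Table 1: 3 of 5 reference genes among 377 SSKB genes.
# N = 20000 is an assumed genome size, used only for illustration.
p_calcineurin = hypergeom_pvalue(k=3, n=377, K=5, N=20000)

adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(p_toy, p_calcineurin, adjusted)
```

Exact integer binomial coefficients keep the sketch free of external dependencies. For large-scale use, a statistics library would be the more practical choice.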

The Gene Ontology [27,28] was accessed with WebGestalt, and analysis was restricted to processes and
functions represented by two or more genes. Pathway analysis was performed with WebGestalt in the Kyoto
Encyclopedia of Genes and Genomes (KEGG) [29,30]. The selection was restricted to pathways with 4 or more
genes represented, resulting in identification of 72 KEGG pathways. The "salivary secretion" pathway
(KO04970) was recently added to KEGG (11/9/10) and was not included in this analysis. This pathway contains
59 genes, seven of which are found in the SSKB gene set.
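
The selection rule above (at least 4 SSKB genes per pathway, adjusted P below the significance level) can be written as a simple filter. The sketch below is illustrative only: the first two entries use values from Table 2, the two "hypothetical" pathways are made up to show what gets excluded, and the names PathwayResult and select_pathways are not part of any tool used in this study.

```python
from typing import NamedTuple


class PathwayResult(NamedTuple):
    name: str
    sskb_genes: int     # SSKB genes mapped to the pathway
    adjusted_p: float   # multiple test-adjusted P-value


def select_pathways(results, min_genes=4, alpha=0.01):
    """Keep pathways with enough genes and a significant adjusted P-value."""
    kept = [r for r in results if r.sskb_genes >= min_genes and r.adjusted_p < alpha]
    return sorted(kept, key=lambda r: r.adjusted_p)


results = [
    PathwayResult("Allograft rejection", 23, 6.82e-38),  # from Table 2
    PathwayResult("Proteasome", 4, 0.0009),              # from Table 2
    PathwayResult("Hypothetical pathway A", 3, 1e-6),    # made up: too few genes
    PathwayResult("Hypothetical pathway B", 9, 0.2),     # made up: not significant
]

selected = select_pathways(results)
print([r.name for r in selected])

# Coverage of the "salivary secretion" pathway by the SSKB gene set (7 of 59 genes).
salivary_fraction = 7 / 59
print(f"{salivary_fraction:.1%}")
```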

Utility and discussion 

We constructed a database containing proteins and genes associated with Sjogren's syndrome in human
disease or animal models, as identified by text mining of published data. The public SSKB currently contains 413
genes/proteins and can be viewed online (http://sskb.umn.edu/). All genes have been assigned gene symbols
and UniProt IDs, which allows rapid retrieval of gene-specific data from external databases. The SSKB
can be used to determine whether a list of genes is enriched with known Sjogren's syndrome genes and to
carry out a functional enrichment analysis (hypergeometric distribution). Individual genes and the
corresponding gene products, synonyms and alternate names can be searched using a web browser search function.
Autoantigens, viral antigens and bacterial antigens are separately identified under "Antigens". The SSKB is
continually maintained and updated, and new genes are added as their analysis is completed.

Based on the abstracts used to retrieve the SSKB genes/proteins, 85 proteins were initially characterized
as autoantigens and 15 proteins were characterized as viral (14) or bacterial (1) antigens. Not surprisingly,
SSA/Ro and SSB/La were among the most frequently retrieved autoantigens. It has been proposed that viral or
bacterial antigens act as autoimmune triggers by molecular mimicry of endogenous human proteins [2,8,9].
However, BLAST analysis of eight of the 14 putative viral antigens in the SSKB did not identify
strong sequence similarity with human proteins (not shown).

The 377 proteins not identified as autoantigens or microbial antigens (477 entries minus 85 autoantigens
and 15 viral or bacterial antigens) were considered candidates for functional genes that could play a role in
the initiation and progression of Sjogren's syndrome. Since the gene list contains data from humans and animals,
the corresponding human genes were identified, with the assumption that genes identified in animal models of
Sjogren's syndrome may also be involved in the human disease.



Gene ontology 

The Gene Ontology database [27] was queried to identify the biological processes, cellular components and
molecular functions associated with genes in the SSKB (Table 1). The 40 most highly enriched entries were
identified in each category.
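
The ranking in Table 1 rests on a simple ratio: the number of SSKB genes observed for a GO term divided by the number of reference genes annotated to that term in the human genome. The sketch below reproduces that arithmetic for three biological process rows taken from Table 1; the function name enrichment_ratio is illustrative.

```python
def enrichment_ratio(observed: int, reference: int) -> float:
    """Fraction of a GO term's reference genes that appear in the gene set."""
    if reference <= 0:
        raise ValueError("reference gene count must be positive")
    return observed / reference


# (GO term, GO ID, reference genes, observed genes), values from Table 1.
terms = [
    ("immune response", "GO:0006955", 750, 133),
    ("regulation of lymphocyte proliferation", "GO:0050670", 81, 32),
    ("adaptive immune response", "GO:0002250", 113, 38),
]

# Rank from most to least enriched, as in Table 1.
ranked = sorted(terms, key=lambda t: enrichment_ratio(t[3], t[2]), reverse=True)

for name, go_id, reference, observed in ranked:
    print(f"{go_id}  {name}: {enrichment_ratio(observed, reference):.2%}")
```

Because small categories reach high ratios with very few genes (a term with three reference genes scores 100% when all three are present), the ratio is best read together with the observed gene count.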

The most highly enriched biological processes (19 of 40; 18 of the top 20) were associated with immune
function, including leukocyte proliferation, leukocyte activation, and regulation of the immune response. Other
prominent biological processes were associated with apoptosis and cell death. Thus, the SSKB data set is
consistent with recent microarray data [16] and reflects current models for the biological processes involved in
the pathogenesis of Sjogren's syndrome [5,31,32].

The most highly enriched cellular component was the calcineurin complex, which plays a major role in the
activation of T cells. Interestingly, in placebo-controlled clinical trials, treatment of Sjogren's syndrome
patients with eye drops that contain the calcineurin inhibitor cyclosporine led to significant improvement in
several of the signs and symptoms of dry eye [33].

Other highly enriched cellular components include: 1) Platelet alpha granules. Although platelet activation has
been reported in the salivary glands of Sjogren's syndrome patients [34], a direct search of PubMed for "platelet
alpha granules" with "Sjogren's" did not retrieve any published studies. Thus, while the proteins identified were
retrieved from the literature, their potential association with platelet alpha granules in Sjogren's syndrome has
not previously been noted. 2) MHC protein complexes, which are presumably involved in the presentation of
autoantigens [16]. 3) Protein-lipid complexes and lipoprotein particles. Their association with Sjogren's
syndrome may be consistent with changes in serum lipid levels in Sjogren's syndrome patients [35], although the
prevalence of anti-phospholipid antibodies is low in Sjogren's syndrome [36]. 4) Nerve terminals and axons,
consistent with the known neurological component of Sjogren's syndrome [37].

In molecular function, nitric oxide synthase (NOS) activity was the most highly enriched, although only three
genes (NOS1-3) were identified. Nitric oxide (NO) signaling appears to be directly affected in salivary and
lacrimal glands in Sjogren's syndrome [38]. Other highly enriched molecular functions include chemokine and
cytokine activity/receptor binding (8 of the top 15) and peptidase activities.

Table 1 Gene Ontology enrichment analysis

Rank | BIOLOGICAL PROCESS | GO ID | Reference Genes | Observed Genes | Ratio
1 | regulation of lymphocyte proliferation | GO:0050670 | 81 | 32 | 39.51%
2 | regulation of leukocyte proliferation | GO:0070663 | 82 | 32 | 39.02%
3 | regulation of mononuclear cell proliferation | GO:0032944 | 82 | 32 | 39.02%
4 | adaptive immune response based on somatic recombination of immune receptors built from immunoglobulin superfamily domains | GO:0002460 | 112 | 38 | 33.93%
5 | adaptive immune response | GO:0002250 | 113 | 38 | 33.63%
6 | lymphocyte proliferation | GO:0046651 | 112 | 37 | 33.04%
7 | leukocyte proliferation | GO:0070661 | 114 | 37 | 32.46%
8 | mononuclear cell proliferation | GO:0032943 | 114 | 37 | 32.46%
9 | regulation of lymphocyte activation | GO:0051249 | 141 | 42 | 29.79%
10 | regulation of cell activation | GO:0050865 | 168 | 46 | 27.38%
11 | regulation of leukocyte activation | GO:0002694 | 159 | 43 | 27.04%
12 | positive regulation of immune system process | GO:0002684 | 229 | 60 | 26.20%
13 | regulation of immune response | GO:0050776 | 218 | 54 | 24.77%
14 | immune effector process | GO:0002252 | 200 | 45 | 22.50%
15 | regulation of immune system process | GO:0002682 | 362 | 79 | 21.82%
16 | lymphocyte activation | GO:0046649 | 272 | 59 | 21.69%
17 | leukocyte activation | GO:0045321 | 324 | 66 | 20.37%
18 | inflammatory response | GO:0006954 | 359 | 71 | 19.78%
19 | cell activation | GO:0001775 | 366 | 71 | 19.40%
20 | immune response | GO:0006955 | 750 | 133 | 17.73%
21 | regulation of response to stimulus | GO:0048583 | 441 | 75 | 17.01%
22 | defense response | GO:0006952 | 657 | 100 | 15.22%
23 | immune system process | GO:0002376 | 1066 | 162 | 15.20%
24 | response to wounding | GO:0009611 | 560 | 85 | 15.18%
25 | response to external stimulus | GO:0009605 | 904 | 110 | 12.17%
26 | multi-organism process | GO:0051704 | 668 | 79 | 11.83%
27 | regulation of programmed cell death | GO:0043067 | 812 | 92 | 11.33%
28 | regulation of apoptosis | GO:0042981 | 805 | 91 | 11.30%
29 | regulation of cell death | GO:0010941 | 815 | 92 | 11.29%
30 | regulation of cell proliferation | GO:0042127 | 739 | 79 | 10.69%
31 | apoptosis | GO:0006915 | 1063 | 102 | 9.60%
32 | programmed cell death | GO:0012501 | 1071 | 102 | 9.52%
33 | response to chemical stimulus | GO:0042221 | 1243 | 117 | 9.41%
34 | cell proliferation | GO:0008283 | 1056 | 98 | 9.28%
35 | death | GO:0016265 | 1171 | 107 | 9.14%
36 | cell death | GO:0008219 | 1167 | 106 | 9.08%
37 | response to stress | GO:0006950 | 1696 | 144 | 8.49%
38 | positive regulation of biological process | GO:0048518 | 1865 | 153 | 8.20%
39 | positive regulation of cellular process | GO:0048522 | 1699 | 130 | 7.65%
40 | response to stimulus | GO:0050896 | 3471 | 221 | 6.37%

Rank | CELLULAR COMPONENT | GO ID | Reference Genes | Observed Genes | Ratio
1 | calcineurin complex | GO:0005955 | 5 | 3 | 60.00%
2 | external side of plasma membrane | GO:0009897 | 131 | 40 | 30.53%

Rank | MOLECULAR FUNCTION | GO ID | Reference Genes | Observed Genes | Ratio
1 | arginine binding | GO:0034618 | 3 | 3 | 100.00%
2 | nitric-oxide synthase activity | GO:0004517 | 3 | 3 | 100.00%
3 | tetrahydrobiopterin binding | GO:0034617 | 3 | 3 | 100.00%
4 | C-X-C chemokine binding | GO:0019958 | 8 | 4 | 50.00%
5 | beta-amyloid binding | GO:0001540 | 13 | 5 | 38.46%
6 | tumor necrosis factor receptor binding | GO:0005164 | 21 | 8 | 38.10%
7 | chemokine activity | GO:0008009 | 47 | 17 | 36.17%
8 | chemokine receptor binding | GO:0042379 | 49 | 17 | 34.69%
9 | coreceptor activity | GO:0015026 | 19 | 6 | 31.58%
10 | tumor necrosis factor receptor superfamily binding | GO:0032813 | 31 | 9 | 29.03%
11 | cytokine receptor binding | GO:0005126 | 178 | 46 | 25.84%
12 | chemokine binding | GO:0019956 | 26 | 6 | 23.08%
13 | cytokine activity | GO:0005125 | 196 | 45 | 22.96%
14 | growth factor receptor binding | GO:0070851 | 67 | 14 | 20.90%
15 | collagen binding | GO:0005518 | 35 | 7 | 20.00%
16 | G-protein-coupled receptor binding | GO:0001664 | 107 | 20 | 18.69%
17 | integrin binding | GO:0005178 | 58 | 9 | 15.52%
18 | cysteine-type endopeptidase activity | GO:0004197 | 71 | 10 | 14.08%
19 | growth factor activity | GO:0008083 | 161 | 19 | 11.80%
20 | cytokine binding | GO:0019955 | 108 | 12 | 11.11%
21 | protein heterodimerization activity | GO:0046982 | 189 | 21 | 11.11%
22 | glycosaminoglycan binding | GO:0005539 | 139 | 14 | 10.07%
23 | protein complex binding | GO:0032403 | 196 | 19 | 9.69%
24 | receptor binding | GO:0005102 | 856 | 83 | 9.70%
25 | receptor signaling protein activity | GO:0005057 | 159 | 15 | 9.43%
26 | pattern binding | GO:0001871 | 153 | 14 | 9.15%
27 | peptidase inhibitor activity | GO:0030414 | 154 | 14 | 9.09%
28 | carbohydrate binding | GO:0030246 | 349 | 29 | 8.31%
29 | endopeptidase activity | GO:0004175 | 370 | 28 | 7.57%
30 | polysaccharide binding | GO:0030247 | 153 | 14 | 9.15%
31 | protein dimerization activity | GO:0046983 | 514 | 36 | 7.00%
32 | identical protein binding | GO:0042802 | 618 | 38 | 6.15%
33 | enzyme binding | GO:0019899 | 505 | 29 | 5.74%
34 | peptidase activity | GO:0008233 | 563 | 30 | 5.33%
35 | peptidase activity, acting on L-amino acid peptides | GO:0070011 | 546 | 29 | 5.31%
36 | molecular transducer activity | GO:0060089 | 2116 | 98 | 4.63%
37 | signal transducer activity | GO:0004871 | 2116 | 98 | 4.63%
38 | receptor activity | GO:0004872 | 1674 | 71 | 4.24%
39 | protein binding | GO:0005515 | 8041 | 280 | 3.48%
40 | binding | GO:0005488 | 12465 | 320 | 2.57%

The table ranks the gene enrichment in biological process, cellular component and molecular function with corresponding GO IDs. For each GO ID, the number of
Observed Genes identified in the SSKB was divided by the number of Reference Genes in the human genome to calculate the Ratio of enrichment (Ratio).

Pathway analysis

The SSKB gene list was submitted to KEGG [29] to identify biological pathways potentially associated with
Sjogren's syndrome. A total of 72 KEGG pathways showed highly significant enrichment (P < 0.001) in this
analysis (Table 2).

The pathway analysis revealed dominant pathways associated with immune regulation. Indeed, the eight most
highly enriched pathways were associated with antigen presenting cells and activation of T cells and B cells.

Several cancer-associated pathways were identified. This is partly due to the overlap between cancer
pathways. These pathways typically include cytokine or growth factor stimulation of cell cycle and cell death and
were not further analyzed.

Pathways associated with apoptosis, cytokine signaling and inflammation were also highly enriched. To focus on
the events associated with initiation of Sjogren's syndrome, we analyzed pathways with known triggers. Several
of the highly enriched pathways are triggered by bacterial toxins, viral DNA, or viral RNA. These include the
Toll-like receptor, NOD-like receptor and RIG-I-like receptor signaling pathways and the cytosolic DNA-sensing
pathway.

Table 2 Biological pathways associated with SSKB genes

Rank | Pathway | SSKB Genes | Enrichment | Raw P | Adjusted P
1 | Allograft rejection | 23 | 76.02 | 3.62E-39 | 6.82E-38
2 | Intestinal immune network for IgA production | 27 | 67.82 | 7.26E-44 | 2.05E-42
3 | Asthma | 14 | 58.61 | 4.14E-22 | 2.75E-21
4 | Type I diabetes mellitus | 20 | 57.09 | 9.13E-31 | 9.38E-30
5 | Graft-versus-host disease | 18 | 53.83 | 3.21E-27 | 2.79E-26
6 | Autoimmune thyroid disease | 22 | 52.13 | 1.29E-32 | 1.82E-31
7 | Primary immunodeficiency | 14 | 50.24 | 6.38E-21 | 3.79E-20
8 | Hematopoietic cell lineage | 33 | 47.1 | 1.39E-46 | 5.24E-45
9 | Toll-like receptor signaling pathway | 37 | 46.01 | 1.13E-51 | 6.38E-50
10 | Apoptosis | 25 | 35.68 | 5.55E-32 | 6.97E-31
11 | NOD-like receptor signaling pathway | 17 | 34.44 | 7.61E-22 | 4.78E-21
12 | Amyotrophic lateral sclerosis (ALS) | 14 | 33.18 | 5.81E-18 | 2.85E-17
13 | Other glycan degradation | 4 | 31.4 | 6.67E-06 | 1.24E-05
14 | Cytokine-cytokine receptor interaction | 66 | 31.05 | 5.91E-79 | 6.68E-77
15 | T cell receptor signaling pathway | 26 | 30.24 | 4.12E-31 | 4.66E-30
16 | RIG-I-like receptor signaling pathway | 17 | 30.07 | 9.98E-21 | 5.64E-20
17 | Cell adhesion molecules (CAMs) | 32 | 29.99 | 6.40E-38 | 1.03E-36
18 | Bladder cancer | 10 | 29.9 | 1.06E-12 | 3.24E-12
19 | Viral myocarditis | 17 | 29.25 | 1.68E-20 | 9.04E-20
20 | Cytosolic DNA-sensing pathway | 13 | 29.16 | 5.78E-16 | 2.42E-15
21 | Pancreatic cancer | 15 | 26.17 | 1.88E-17 | 8.50E-17
22 | Small cell lung cancer | 16 | 23.92 | 7.32E-18 | 3.45E-17
23 | Glycosaminoglycan degradation | 4 | 23.92 | 2.13E-05 | 3.65E-05
24 | Natural killer cell mediated cytotoxicity | 25 | 22.92 | 1.06E-26 | 8.56E-26
25 | ErbB signaling pathway | 13 | 22.16 | 2.51E-13 | 8.86E-13
26 | Epithelial cell signaling in Helicobacter pylori infection | 12 | 22.16 | 2.64E-13 | 9.04E-13
27 | Complement and coagulation cascades | 12 | 21.84 | 3.17E-13 | 1.05E-12
28 | B cell receptor signaling pathway | 13 | 21.77 | 3.38E-14 | 1.23E-13
29 | Prion diseases | 6 | 21.53 | 3.27E-07 | 6.84E-07
30 | Antigen processing and presentation | 15 | 21.17 | 5.49E-16 | 2.39E-15
31 | Colorectal cancer | 14 | 20.93 | 6.14E-15 | 2.48E-14
32 | Adipocytokine signaling pathway | 11 | 20.62 | 6.05E-12 | 1.80E-11
33 | Chemokine signaling pathway | 30 | 19.83 | 7.80E-30 | 7.35E-29
34 | Prostate cancer | 14 | 19.76 | 1.42E-14 | 5.53E-14
35 | Glioma | 10 | 19.32 | 1.10E-10 | 2.89E-10
36 | Jak-STAT signaling pathway | 23 | 18.64 | 1.67E-22 | 1.18E-21
37 | Non-small cell lung cancer | 8 | 18.61 | 1.13E-08 | 2.50E-08
38 | Melanoma | 10 | 17.69 | 2.71E-10 | 6.96E-10
39 | Pathways in cancer | 46 | 17.51 | 9.85E-43 | 2.23E-41
40 | Fc epsilon RI signaling pathway | 11 | 17.49 | 3.90E-11 | 1.05E-10
41 | Chronic myeloid leukemia | 10 | 16.75 | 4.74E-10 | 1.19E-09
42 | GnRH signaling pathway | 12 | 14.92 | 3.42E-11 | 9.43E-11
43 | Leukocyte transendothelial migration | 14 | 14.9 | 7.91E-13 | 2.48E-12
44 | VEGF signaling pathway | 9 | 14.87 | 1.04E-08 | 2.35E-08
45 | Hypertrophic cardiomyopathy (HCM) | 10 | 14.78 | 1.67E-09 | 4.10E-09
46 | p53 signaling pathway | 8 | 14.56 | 8.19E-08 | 1.75E-07
47 | Endometrial cancer | 6 | 14.49 | 3.65E-06 | 7.11E-06
48 | Systemic lupus erythematosus | 16 | 14.35 | 3.27E-14 | 1.23E-13
49 | MAPK signaling pathway | 30 | 14.01 | 3.15E-25 | 2.37E-24
50 | Focal adhesion | 22 | 13.75 | 1.21E-18 | 6.21E-18
51 | Dilated cardiomyopathy | 10 | 13.65 | 3.66E-09 | 8.44E-09
52 | Type II diabetes mellitus | 5 | 13.36 | 3.63E-05 | 6.12E-05
53 | Neurotrophin signaling pathway | 13 | 12.96 | 3.17E-11 | 8.96E-11
54 | ECM-receptor interaction | 8 | 11.96 | 3.85E-07 | 7.91E-07
55 | Alzheimer's disease | 16 | 11.89 | 6.32E-13 | 2.04E-12
56 | Lysosome | 11 | 11.81 | 2.86E-09 | 6.73E-09
57 | Arginine and proline metabolism | 5 | 11.63 | 7.15E-05 | 0.0001
58 | Renal cell carcinoma | 6 | 10.77 | 2.09E-05 | 3.63E-05
59 | Long-term depression | 6 | 10.77 | 2.09E-05 | 3.63E-05
60 | Long-term potentiation | 6 | 10.77 | 2.09E-05 | 3.63E-05
61 | Proteasome | 4 | 10.47 | 0.0006 | 0.0009
62 | Progesterone-mediated oocyte maturation | 7 | 10.22 | 6.00E-06 | 1.15E-05
63 | TGF-beta signaling pathway | 7 | 10.11 | 6.48E-06 | 1.22E-05
64 | Regulation of actin cytoskeleton | 16 | 9.3 | 2.69E-11 | 7.79E-11
65 | Calcium signaling pathway | 13 | 9.17 | 2.36E-09 | 5.67E-09
66 | Wnt signaling pathway | 11 | 9.15 | 4.17E-08 | 9.06E-08
67 | Gap junction | 6 | 8.37 | 8.67E-05 | 0.0001
68 | Cell cycle | 8 | 7.85 | 9.32E-06 | 1.70E-05
69 | Oocyte meiosis | 7 | 7.71 | 3.80E-05 | 6.31E-05
70 | Axon guidance | 7 | 6.82 | 8.33E-05 | 0.0001
71 | Endocytosis | 10 | 6.72 | 2.93E-06 | 5.81E-06
72 | Metabolic pathways | 26 | 2.96 | 1.12E-06 | 2.26E-06

The table lists the number of SSKB genes associated with individual KEGG pathways. The pathways are ranked according to their Enrichment relative to the number of
reference genes in the human genome based on the hypergeometric test. The raw P-values (hypergeometric test) and the multiple test-adjusted P-values are listed for
each pathway.

Overlap with other autoimmune diseases 

The KEGG pathways include several pathways for autoimmune diseases, including type I diabetes mellitus,
autoimmune thyroid disease, and SLE. While about 50% of the genes associated with the first two pathways are
also associated with Sjogren's syndrome, only 16 Sjogren's syndrome genes were identified in the 140-gene
SLE pathway (KEGG ID: hsa05322). These findings suggest that significant differences exist in the pathogenesis
of autoimmune diseases.
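
The overlap figures quoted above are fractions of a pathway's genes that also occur in the SSKB gene set. A minimal sketch of that calculation follows. The gene symbols in the toy sets are placeholders rather than real SSKB or KEGG entries, and only the SLE counts (16 of 140) come from the text.

```python
def overlap_fraction(disease_genes: set, pathway_genes: set) -> float:
    """Fraction of pathway genes that are also in the disease gene set."""
    if not pathway_genes:
        raise ValueError("pathway gene set must not be empty")
    return len(disease_genes & pathway_genes) / len(pathway_genes)


# Toy sets with placeholder symbols, to show the set-based calculation.
toy_sskb = {"GENE_A", "GENE_B", "GENE_C"}
toy_pathway = {"GENE_B", "GENE_C", "GENE_D", "GENE_E"}
toy_fraction = overlap_fraction(toy_sskb, toy_pathway)  # 2 of 4 pathway genes

# SLE pathway (hsa05322): 16 SSKB genes among 140 pathway genes, as stated above.
sle_fraction = 16 / 140
print(f"{toy_fraction:.0%} {sle_fraction:.1%}")
```

By this measure the SLE pathway overlap is roughly 11%, well below the approximately 50% reported for the type I diabetes mellitus and autoimmune thyroid disease pathways.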

Conclusions 

The results of this analysis can serve as a background and comparison for the increasing number of gene
expression data sets available for Sjogren's syndrome, e.g. [15-17]. Preliminary analysis of such data sets
suggests that the biological pathways identified in the SSKB are very similar to those identified in human
parotid tissue but quite different from those identified in human labial salivary glands [15]. Future analyses
will further define these differences and focus on the comparison of biological pathways identified in human
tissues and mouse models of Sjogren's syndrome. It is envisioned that the SSKB data can also serve as the
starting point for literature reviews and literature-based validation of identified genes; for functional gene
enrichment studies; for defining gene sets for SNP set enrichment analysis (pathway-based GWAS studies) and
gene set enrichment analysis (GSEA); and for identifying protein-protein interaction networks (for example,
based on yeast two-hybrid data) and other bioinformatics analyses among the SSKB genes.

Availability and requirements 

The Sjogren's syndrome knowledge base is freely available at http://sskb.umn.edu.